- Home
- Search Results
- Page 1 of 1
Search for: All records
-
Total Resources1
- Resource Type
-
0000100000000000
- More
- Availability
-
10
- Author / Contributor
- Filter by Author / Creator
-
-
Chen, Siyu (1)
-
He, Jianliang (1)
-
Yang, Zhuoran (1)
-
Zhang, Fengzhuo (1)
-
#Tyler Phillips, Kenneth E. (0)
-
#Willis, Ciara (0)
-
& Abreu-Ramos, E. D. (0)
-
& Abramson, C. I. (0)
-
& Abreu-Ramos, E. D. (0)
-
& Adams, S.G. (0)
-
& Ahmed, K. (0)
-
& Ahmed, Khadija. (0)
-
& Aina, D.K. Jr. (0)
-
& Akcil-Okan, O. (0)
-
& Akuom, D. (0)
-
& Aleven, V. (0)
-
& Andrews-Larson, C. (0)
-
& Archibald, J. (0)
-
& Arnett, N. (0)
-
& Arya, G. (0)
-
- Filter by Editor
-
-
& Spizer, S. M. (0)
-
& . Spizer, S. (0)
-
& Ahn, J. (0)
-
& Bateiha, S. (0)
-
& Bosch, N. (0)
-
& Brennan K. (0)
-
& Brennan, K. (0)
-
& Chen, B. (0)
-
& Chen, Bodong (0)
-
& Drown, S. (0)
-
& Ferretti, F. (0)
-
& Higgins, A. (0)
-
& J. Peters (0)
-
& Kali, Y. (0)
-
& Ruiz-Arias, P.M. (0)
-
& S. Spitzer (0)
-
& Sahin. I. (0)
-
& Spitzer, S. (0)
-
& Spitzer, S.M. (0)
-
(submitted - in Review for IEEE ICASSP-2024) (0)
-
-
Have feedback or suggestions for a way to improve these results?
!
Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
In this work, we theoretically investigate why large language model (LLM)-empowered agents can solve decision-making problems in the physical world. We consider a hierarchical reinforcement learning (RL) model where the LLM Planner handles high-level task planning and the Actor performs low-level execution. Within this model, the LLM Planner operates in a partially observable Markov decision process (POMDP), iteratively generating language-based subgoals through prompting. Assuming appropriate pretraining data, we prove that the pretrained LLM Planner effectively conducts Bayesian aggregated imitation learning (BAIL) via in-context learning. We also demonstrate the need for exploration beyond the subgoals produced by BAIL, showing that naively executing these subgoals results in linear regret. To address this, we propose an ε-greedy exploration strategy for BAIL, which we prove achieves sublinear regret when pretraining error is low. Finally, we extend our theoretical framework to cases where the LLM Planner acts as a world model to infer the environment’s transition model and to multi-agent settings, facilitating coordination among multiple Actors.more » « less
An official website of the United States government

Full Text Available